R on High-Performance Computing (HPC)
Audience: Clinical researchers, biologists, and chemists new to HPC scripting.
Goal: Show how R can be used efficiently on HPC systems like the St. Jude HPCF.
2025-01-08
Think of HPC as a gourmet kitchen with many chefs.
Key Concept: Split your work so many nodes can help.
A CPU, or Central Processing Unit, is the main “brain” of a computer. A CPU core is a single processing unit within that CPU.
Think of the CPU as a house and the cores as rooms within that house. Each core can independently execute instructions, allowing the CPU to handle multiple tasks simultaneously.
Modern CPUs often have multiple cores (dual-core, quad-core, etc.), enabling greater multitasking capabilities.
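You can ask R how many cores the current machine exposes; the `parallel` package ships with base R:

```r
library(parallel)

# Number of logical CPU cores visible to this R session
n_cores <- detectCores()
cat(sprintf("This machine exposes %d logical cores\n", n_cores))
```

Note that on a shared cluster your job is usually granted fewer cores than the node physically has, so size your parallel workers by what you requested from the scheduler, not by `detectCores()`.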
```bash
ssh your_username@hpc.stjude.org
```

```bash
bsub < job.sh   # submit a job
bjobs           # check job status
bqueues         # view queue info
bkill <jobID>   # kill a job
```
Example queues:

- `interactive`: Allows interactive sessions like RStudio
- `gpu_priority`: For GPU-enabled jobs, with higher urgency
- `compbio`: A queue shared by Computational Biology groups
- `priority`: A queue with expedited scheduling
- `large_mem`: Reserved for high-RAM needs (e.g., 500+ GB)

| Example | Description | Mode | Why it Matters |
|---|---|---|---|
| ✅ Small R Script | Read & summarize cancer dataset | Interactive & Batch | Illustrates two job submission styles |
| 🚀 Bootstrap | Cox models on 10,000 bootstraps | Batch + Parallel | Shows clear speed-up using HPC |
| 🧠 GPU Matrix Mult | Matrix product on GPU | GPU + Batch | Showcases GPU for numeric computing |
```bash
$ bsub -P survival_analysis -q interactive -Is bash
$ module avail R
$ module load R/4.4.0
$ R
```
```r
# Inside R session
library(dplyr)
library(survival)

data(cancer, package = "survival")
head(cancer)
```
```
##   inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1    3  306      2  74   1       1       90       100     1175      NA
## 2    3  455      2  68   1       0       90        90     1225      15
## 3    3 1010      1  56   1       0       90        90       NA      15
## 4    5  210      2  57   1       1       90        60     1150      11
## 5    1  883      2  60   1       0      100        90       NA       0
## 6   12 1022      1  74   1       1       50        80      513       0
```
```r
# inst:      Institution code
# time:      Survival time in days
# status:    Censoring status (1 = censored, 2 = dead)
# age:       Age in years
# sex:       Male = 1, Female = 2
# ph.ecog:   ECOG performance score as rated by the physician:
#            0 = asymptomatic, 1 = symptomatic but completely ambulatory,
#            2 = in bed < 50% of the day, 3 = in bed > 50% of the day
#            but not bedbound, 4 = bedbound
# ph.karno:  Karnofsky performance score (bad = 0, good = 100) rated by physician
# pat.karno: Karnofsky performance score as rated by patient
# meal.cal:  Calories consumed at meals
# wt.loss:   Weight loss in last six months (pounds)

results <- cancer %>%
  group_by(sex) %>%
  summarise(survival_rate = mean(time))
print(results)
```
```
## # A tibble: 2 × 2
##     sex survival_rate
##   <dbl>         <dbl>
## 1     1          283.
## 2     2          339.
```
```r
# write.csv(results, "../../output/survival_analysis_interactive.csv")
```
survival_analysis.R
job1.sh
```bash
#!/bin/bash
#BSUB -n 1
#BSUB -q priority
#BSUB -W 00:10
#BSUB -R "rusage[mem=2048]"
#BSUB -J survival_analysis
#BSUB -o logs/output.%J.log
#BSUB -e logs/error.%J.log

BASE_DIR=/research/rgs01/home/clusterHome/zqu/workshop07252025/code/
cd $BASE_DIR
module load R/4.4.0
Rscript R/survival_analysis.R
```
parallel_bootstrap.R
```r
library(survival)
library(doParallel)
library(foreach)

data(cancer, package = "survival")

# ----- 1. Parallel Execution -----
start_parallel <- Sys.time()

cl <- makeCluster(8)
registerDoParallel(cl)

coefs_parallel <- foreach(i = 1:10000, .combine = rbind, .packages = c("survival")) %dopar% {
  data(cancer, package = "survival")
  samp <- cancer[sample(nrow(cancer), replace = TRUE), ]
  coef(coxph(Surv(time, status) ~ age + sex, data = samp))
}

stopCluster(cl)

end_parallel <- Sys.time()
parallel_time <- end_parallel - start_parallel
time_value <- as.numeric(parallel_time)
time_unit <- attr(parallel_time, "units")
cat(sprintf("⏱️ Time taken (parallel): %.2f %s\n", time_value, time_unit))

# ----- 2. Sequential Execution -----
start_seq <- Sys.time()

coefs_seq <- matrix(NA, nrow = 10000, ncol = 2)
for (i in 1:nrow(coefs_seq)) {
  samp <- cancer[sample(nrow(cancer), replace = TRUE), ]
  fit <- coxph(Surv(time, status) ~ age + sex, data = samp)
  coefs_seq[i, ] <- coef(fit)
}

end_seq <- Sys.time()
sequential_time <- end_seq - start_seq
time_value <- as.numeric(sequential_time)
time_unit <- attr(sequential_time, "units")
cat(sprintf("⏱️ Time taken (sequential): %.2f %s\n", time_value, time_unit))

# Optional: Save outputs if needed
write.csv(coefs_parallel, "../output/bootstrap_coefs_parallel.csv", row.names = FALSE)
write.csv(coefs_seq, "../output/bootstrap_coefs_sequential.csv", row.names = FALSE)
```
bootstrap_job.sh
```bash
#!/bin/bash
#BSUB -J r_parallel_bootstrap
#BSUB -q priority
#BSUB -n 10
#BSUB -R "span[hosts=1]"       # All cores on one node
#BSUB -W 00:30                 # Runtime limit hh:mm
#BSUB -R "rusage[mem=100000]"
#BSUB -o logs/output.%J.log
#BSUB -e logs/error.%J.log

BASE_DIR=/research/rgs01/home/clusterHome/zqu/workshop07252025/code/
cd $BASE_DIR
module load R/4.4.0
Rscript R/parallel_bootstrap.R
```
| Feature | CPU | GPU |
|---|---|---|
| Cores | Few (2–64) | Hundreds to thousands |
| Task Type | Sequential tasks | Parallelizable tasks |
| Latency | Low (good for logic) | Higher (good for throughput) |
| Memory Hierarchy | Complex, flexible | Simpler, faster bandwidth |
| Best For | General-purpose processing | Large-scale matrix operations |
| | CPU Parallel (doParallel) | GPU Parallel (gpuR) |
|---|---|---|
| Parallelism Type | Multi-core | Many-core (SIMD) |
| Ideal Use Case | Independent iterations | Matrix ops, deep learning |
| Setup Complexity | Low to Medium | Medium (GPU config required) |
| Speed Improvement | Moderate | High (if vectorized) |
| Limitations | Memory bottlenecks | Data transfer & library limits |
Without SIMD (CPU style): You make one cookie at a time — scoop dough, shape, bake, repeat.
With SIMD (GPU style): You have a tray with 100 molds and pour dough into all of them at once — one instruction, multiple cookies baked together.
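The same contrast shows up inside R itself: a vectorized operation hands one instruction to a whole vector at once, while an explicit loop shapes one element at a time.

```r
x <- rnorm(1e5)

# One cookie at a time: explicit loop over elements
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# One tray, many molds: a single vectorized instruction
squares_vec <- x^2

all.equal(squares_loop, squares_vec)
```

Both produce identical results, but the vectorized form dispatches the work in bulk, which is exactly the style that GPUs reward.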
```bash
bsub -q gpu_interactive -n 1 -Is bash
module load R/4.1.2
```
```r
library(gpuR)
library(microbenchmark)

# Matrix dimensions
n <- 1000

# GPU inversion
gpu_time <- microbenchmark({
  # 1. Generate data in CPU memory
  x <- matrix(rnorm(n^2), nrow = n)

  # 2. Transfer to GPU
  mat_gpu <- gpuMatrix(x, type = "float")

  # 3. Perform operations
  inv_gpu <- solve(mat_gpu)
}, times = 1)

cat(sprintf("🧠 GPU Time: %.2f %s\n", gpu_time$time[1] / 1e9, "seconds"))
```
```
GPU Time: 1.51 seconds
CPU Time: 1.89 seconds
```
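For the CPU side of the comparison, the same benchmark can be run with base R's `solve()` on an ordinary matrix; a minimal sketch, assuming the same `n` and the `microbenchmark` package used above:

```r
library(microbenchmark)

# Matrix dimensions (same as the GPU example)
n <- 1000
x <- matrix(rnorm(n^2), nrow = n)

# CPU inversion with base R's solve(); times are reported in nanoseconds
cpu_time <- microbenchmark(solve(x), times = 1)
cat(sprintf("💻 CPU Time: %.2f seconds\n", cpu_time$time[1] / 1e9))
```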
gpu_inverse_job.R
gpu_job.sh
```bash
#!/bin/bash
#BSUB -J gpu_inv_array[1-10]     # Job array: 10 parallel jobs
#BSUB -q gpu_priority            # Use appropriate GPU queue
#BSUB -gpu "num=1"               # Request 1 GPU per job
#BSUB -n 1                       # 1 CPU core is enough
#BSUB -R "rusage[mem=1GB]"       # Memory per job
#BSUB -W 0:15                    # Walltime
#BSUB -o log/gpu_job_%J_%I.out   # Stdout per array task
#BSUB -e log/gpu_job_%J_%I.err   # Stderr per array task

BASE_DIR=/research/rgs01/home/clusterHome/zqu/workshop07252025/code/
cd $BASE_DIR

# Load R module (adjust version as needed)
module load R/4.1.2

# Run R script
Rscript R/gpu_inverse_job.R
```
✅ Test on small data first
✅ Use vectorized & parallel functions
✅ Monitor jobs and scale accordingly
✅ Request only what you need
✅ Use job arrays or packages like clustermq, batchtools
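Job arrays pair naturally with R: LSF sets the `LSB_JOBINDEX` environment variable for each array task, so one script can pick its own share of the work. A minimal sketch, where the chunk file naming is purely illustrative:

```r
# LSF sets LSB_JOBINDEX for each task of a job array (e.g. -J "myjob[1-10]")
idx <- as.integer(Sys.getenv("LSB_JOBINDEX", unset = "1"))

# Hypothetical layout: inputs pre-split into numbered chunk files
input_file <- sprintf("data/chunk_%02d.csv", idx)
cat(sprintf("Array task %d would process %s\n", idx, input_file))
```

The same pattern underlies packages like `clustermq` and `batchtools`, which automate the splitting and submission for you.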
You can now run R jobs smarter and faster on HPC.
Let the cluster cook for you 🍽️!